Stylometric Authorship Attribution of Collaborative Documents
نویسندگان
چکیده
Stylometry is the study of writing style based on linguistic features and is typically applied to authorship attribution problems. In this work, we apply stylometry to a novel dataset of multi-authored documents collected from Wikia using both relaxed classification with a support vector machine (SVM) and multi-label classification techniques. We define five possible scenarios and show that one, the case where labeled and unlabeled collaborative documents by the same authors are available, yields high accuracy on our dataset while the other, more restrictive cases yield lower accuracies. Based on the results of these experiments and knowledge of the multi-label classifiers used, we propose a hypothesis to explain this overall poor performance. Additionally, we perform authorship attribution of pre-segmented text from the Wikia dataset, and show that while this performs better than multi-label learning it requires large amounts of data to be successful.
منابع مشابه
Deep Sentence-Level Authorship Attribution
We examine the problem of authorship attribution in collaborative documents. We seek to develop new deep learning models tailored to this task. We have curated a novel dataset by parsing Wikipedia’s edit history, which we use to demonstrate the feasiblity of deep models to multi-author attribution at the sentence-level. Though we attempt to formulate models which learn stylometric features base...
متن کاملStyle based Authorship Attribution on English Editorial Documents
The aim of the authorship attribution is identification of the author/s of unknown document(s). Every author has a unique style of writing pattern. The present paper identifies the unique style of an author(s) using lexical stylometric features. The lexical feature vectors of various authors are used in the supervised machine learning algorithms for predicting the unknown document. The highest ...
متن کاملClassify, but Verify: Breaking the Closed-World Assumption in Stylometric Authorship Attribution
Forensic stylometry is a form of authorship attribution that relies on the linguistic information found in a document. While there has been significant work in stylometry, most research focuses on the closed-world problem where the document’s author is in a known suspect set. For open-world problems where the author may not be in the suspect set, traditional methods used in classification are i...
متن کاملInvestigating Topic Influence in Authorship Attribution
The aim of this paper is to explore text topic influence in authorship attribution. Specifically, we test the widely accepted belief that stylometric variables commonly used in authorship attribution are topic-neutral and can be used in multi-topic corpora. In order to investigate this hypothesis, we created a special corpus, which was controlled for topic and author simultaneously. The corpus ...
متن کاملStylometry and collaborative authorship: Eddy, Lovecraft, and 'The Loved Dead'
The authorship of the 1924 short story ‘The Loved Dead’ has been contested by family members of Clifford Martin Eddy, Jr. and Sunand Tryambak Joshi, a leading scholar on Howard Phillips Lovecraft. The authors of this article use stylometric methods to provide evidence for a claim about the authorship of the story and to analyze the nature of Eddy’s collaboration with Lovecraft. Further, we exte...
متن کامل